Parallel processing data transfer arrangements

ABSTRACT

A data transfer arrangement may be used in a System on a Chip (SoC). The SoC has a processing element fabric and a logic element fabric. The two fabrics are coupled by a fabric exchange element to transfer data efficiently between the processing element fabric and the logic element fabric to facilitate parallel processing.

TECHNICAL FIELD

The present subject matter pertains to parallel processing arrangements and, more particularly, to data transfer among parallel processing arrangements.

BACKGROUND

Modern processing systems are able to handle large amounts of data. The processing system's ability to transmit such data is typically limited.

Often the processing ability of such systems is increased by adding more processors. Sometimes tasks are partitioned among processors. These tasks may be performed by various processors in parallel, that is, via parallel processing among processors or among processor groups.

These processors or processor groups often are required to communicate with one another. In order to effectively communicate, these processors may choose to send data back and forth. If one processor that is executing a task is operating in parallel with another processor, the first processor may have to wait for some information or data before it can perform or execute its task. Data exchange therefore becomes critical to the efficient operation of the processors.

Parallel signal processing is often useful in situations where complex signaling arrangements require fast signal processing and signal conversions that must be performed much faster than software is able to do, for example. This signal processing and conversion may be computationally intensive.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a layout of a system on a chip embodying a data transfer arrangement in accordance with an embodiment of the present invention.

FIG. 2 is a block diagram of an embodiment of a data transfer arrangement depicting the arrangement of the fabric exchange element (FEE) of FIG. 1.

FIG. 3 is a block diagram of an application of a data transfer arrangement in accordance with an embodiment of the present invention.

FIG. 4 is a flow chart of a method for data transfer in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of a layout of a system on a chip (SoC) 10 embodying a data transfer arrangement of the present invention, as an example. SoC 10, for example, includes three structures: an FPGA fabric 14, a parallel processing element (PE) fabric 16 and a fabric exchange element (FEE) 15 coupling the FPGA fabric 14 and the PE fabric 16.

The FPGA fabric 14 may be coupled to an R×S array of macro cells (MCs) 13 operating as logic elements, where “R” and “S” are positive integers. The macro cells 13 may be ordered as an array, as shown in FIG. 1. The FPGA fabric 14 may be an interconnect implementation that provides connectivity in one or more ways between macro cells 13.

The PE fabric 16 may include a K×L array of processing elements (PEs) 17, where “K” and “L” are positive integers. The PEs 17 and MCs 13 may exchange data to perform cohesive operations, such as signal processing or signal conversion. The fabric exchange element 15 renders this data transfer or exchange possible. Processing elements 17 may transmit or send data to one or more macro cells 13. Macro cells 13 may also transmit or send data to one or more processing elements 17.

The processing element fabric 16 and each of the PEs 17 may have a clocking arrangement, bus width, layout and topology very different from the MCs 13 of the FPGA fabric 14. The fabric exchange element (FEE) 15 smoothly interfaces the processing element fabric 16 with the FPGA fabric 14. Specifically, the FEE 15 compensates for differing clocking arrangements, bus widths, layout and topology between the PEs 17 and the MCs 13 as may be implemented by each of their respective fabrics 16 and 14.

For example, if the busing arrangement of the MCs 13 is a 4-bit bus and the busing arrangement of the PEs 17 is an 8-bit bus, the FEE 15 separates the 8-bit bus of the PEs 17 into two 4-bit buses in order to exchange or transfer data and achieve compatibility between the two fabrics, in this case the FPGA fabric 14 and the processing element fabric 16. Similarly, the FEE 15 may account for different clocking arrangements, accessing arrangements and multiple PE 17 or MC 13 element access by a single MC 13 or PE 17.
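The bus-width example above can be illustrated with a minimal Python sketch. The function names `split_bus_word` and `join_bus_words` are hypothetical, and the most-significant-first ordering of the narrow words is an assumption; the description does not specify how the FEE 15 orders the two 4-bit halves.

```python
def split_bus_word(word, wide_bits=8, narrow_bits=4):
    """Split one wide-bus word into narrow-bus words.

    Assumption: the most significant narrow word comes first.
    """
    if wide_bits % narrow_bits != 0:
        raise ValueError("wide bus width must be a multiple of the narrow width")
    if not 0 <= word < (1 << wide_bits):
        raise ValueError("word does not fit on the wide bus")
    mask = (1 << narrow_bits) - 1
    count = wide_bits // narrow_bits
    # Shift each narrow field down to bit 0 and mask off the rest.
    return [(word >> (narrow_bits * i)) & mask for i in reversed(range(count))]


def join_bus_words(parts, narrow_bits=4):
    """Reassemble narrow-bus words (most significant first) into one wide word."""
    word = 0
    for part in parts:
        word = (word << narrow_bits) | part
    return word
```

With the default widths, an 8-bit PE word is carried as two 4-bit MC words, and joining the two words restores the original value.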

As a result, the FEE 15 allows the two fabrics, PEs 17 and MCs 13, to operate independently. Further, the two fabrics may operate asynchronously. The FEE 15 allows any source (PE or MC) node coupled to the FEE 15 to send or transmit data to any destination (MC or PE) node coupled to the FEE 15. Thus exchange of data between the fabrics 14 and 16 is facilitated.

In another embodiment, SoC 10 may comprise a semiconductor device. FPGA fabric 14, FEE 15 and PE fabric 16 may be implemented on one or more semiconductor devices as fabricated by various technologies known to those of ordinary skill in the art. In yet another embodiment, the FPGA fabric 14, FEE 15 and PE fabric 16 may be implemented discretely.

FIG. 2 is a block diagram of an embodiment of a data transfer arrangement depicting the arrangement of the fabric exchange element (FEE) 15 of FIG. 1. FEE 15 may include a clock 20. Clock 20 may be independent of a clock of the FPGA fabric 14 or the PE fabric 16. The clock 20 provides a synchronous clock signal to each shift register 31-35 of the plurality of N-bit registers and to each shift register 41-45 of the plurality of N-bit registers. The designation “N-bit” indicates that various register widths are within the contemplation of this arrangement. The selection of a specific N is left to a system designer.

N-bit registers 31-35 couple the FPGA fabric 14 to the PE fabric 16 of FIG. 1. Similarly, N-bit registers 41-45 couple the PE fabric 16 to the FPGA fabric 14. More specifically, each N-bit register 31-35 is coupled to an MC 13, and each N-bit register 41-45 is coupled to a PE 17.

Each register 31-35 and 41-45 within the FEE 15 has a C-bit 51-55, 61-65 that indicates when data is available for communicating between each shift register coupled to a PE 17 and each shift register coupled to an MC 13. Circular shift register 50 has a first portion of shift registers (31-35) and a second portion of shift registers (41-45), each shift register coupled in a circular arrangement.

Each N-bit register 31-35 and 41-45 may include a shift register, buffer, FIFO (first-in-first-out) device, a read/write memory device, or any kind of parallel (N-bit) store-and-forward arrangement that may be coupled in a circular fashion.

In an embodiment of the present invention depicted in FIG. 2, registers 31-35 and 41-45 may be shift registers, although as mentioned above other such read/write store-and-forward devices may be utilized. Shift register 35 is coupled N-bits in parallel to shift register 34; shift register 34 is coupled N-bits in parallel to shift register 33; shift register 33 is coupled N-bits in parallel to shift register 32; and shift register 32 is coupled N-bits in parallel to shift register 31. When a clock signal from clock 20 is applied to shift registers 31-35, each shift register transfers in parallel its N-bit wide data to the shift register above it in the flow of the arrows. For example, shift register 32 transfers its data contents to shift register 31. Each of the other shift registers 33-35 performs in a similar manner.

Shift register 41 is coupled N-bits in parallel to shift register 42; shift register 42 is coupled N-bits in parallel to shift register 43; shift register 43 is coupled N-bits in parallel to shift register 44; and shift register 44 is coupled N-bits in parallel to shift register 45. When a clock signal from clock 20 is applied to shift registers 41-45, each shift register transfers in parallel its N-bit wide data to the shift register below it in the flow of the arrows. For example, shift register 41 transfers its data to shift register 42. Each of the other shift registers 42-45 performs in a similar manner.

A limited number of shift registers is shown in FIG. 2, by way of example and not of limitation; however, many more shift registers may be included.

The first shift register 31 in the portion of shift registers 31-35 is coupled N-bits in parallel to the last shift register 41 in another portion of shift registers 41-45. As a result, when clock 20 sends a clock signal to shift register 31, shift register 31 transfers its N-bits of data in parallel to shift register 41. Note that data from the MC 13 to which the shift register 31 is coupled now may be transferred to shift register 41, so that the data can be accessed or read out by a PE 17 to which the shift register 41 is coupled.

The first shift register 45 in the portion of shift registers 41-45 is coupled N-bits in parallel to the last shift register 35 in the other portion of shift registers 31-35. As a result, when clock 20 sends a clock signal to shift register 45, shift register 45 transfers its N-bits of data in parallel to shift register 35 of the other portion of shift registers. Note that data from the PE 17 to which the shift register 45 is coupled may now be transferred to shift register 35, so that the data can be accessed or read out by an MC 13 to which the shift register 35 is coupled.

Shift registers 31-35 and 41-45 may be viewed as a circular shift register of N-bits in width. Additionally, the circular shift register may be viewed as a “wheel” that turns incrementally, thereby moving data from one shift register to another on the “wheel”. As the “wheel” turns on each clock cycle, the PE 17 or MC 13 corresponding to a shift register may access or read out the data of its corresponding shift register for use.
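The “wheel” can be modeled as a list that rotates by one position per clock. The following Python sketch is illustrative only: `RING_ORDER` and `clock_tick` are hypothetical names, and the list order simply follows the couplings described for FIG. 2 (35 to 34 to 33 to 32 to 31, then 41 through 45, and from 45 back to 35).

```python
# Register reference numerals in the order that data travels around the ring.
RING_ORDER = [35, 34, 33, 32, 31, 41, 42, 43, 44, 45]


def clock_tick(slots):
    """Return the ring contents after one clock cycle.

    slots[i] is the N-bit word held by register RING_ORDER[i]. On a tick,
    every register hands its word to the next register in the ring, so the
    last register's word wraps around to the first.
    """
    return [slots[-1]] + slots[:-1]


# Tag each word with the register that initially holds it.
initial = [f"word-from-{reg}" for reg in RING_ORDER]
after_one_tick = dict(zip(RING_ORDER, clock_tick(initial)))
```

After one tick in this model, register 41 holds the word that started in register 31, and register 35 holds the word that started in register 45, which is the crossing from one fabric's portion of the ring to the other's.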

As noted above, each register 31-35 and 41-45 within the FEE 15 has a C-bit 51-55, 61-65 that indicates when data is available for communicating between each shift register coupled to a PE 17 and each shift register coupled to an MC 13. Each of shift registers 31-35 and 41-45 has its own associated C-bit. The C-bit indicates that the loading of data into the corresponding shift register has been completed. Either a PE 17 or an MC 13 will load its corresponding shift register with data, if the PE 17 or MC 13 has data to transfer to one or more of the other MCs 13 or PEs 17.

When the SoC system 10 is initialized, each of the C-bits 51-55 and 61-65 is reset and cleared. Next, each SoC node, whether PE 17 or MC 13, may load its corresponding shift register with data to be transferred to the other portion of the fabric. That is, data may be sent from the FPGA fabric 14 to the PE fabric 16 and vice versa. Some SoC nodes may have data to load into the corresponding shift register, and some may not. In any event, each SoC node that is coupled to the FEE 15 sets the C-bit corresponding to its shift register when it has completed its transfer of data to that shift register. If the SoC node has no data to transfer this cycle, the node also sets the C-bit.

When all the C-bits are set, the circular shift register or “wheel” begins to shift or turn incrementally. The rotation in FIG. 2 is shown as clockwise, although either clockwise or counter-clockwise rotation is contemplated by the present arrangement. Each SoC node may then optionally read out the data of the corresponding shift register, so that in M−1 clock cycles each SoC node that is coupled to FEE 15 has had access to the data of each of the other SoC coupled nodes. M is defined as the number of PEs 17 plus the number of MCs 13 that are coupled to FEE 15.
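The C-bit gating and the M−1 cycle property can be checked with a small simulation. This is a sketch under stated assumptions, not the disclosed hardware: the `Wheel` class and its method names are hypothetical, nodes are indexed 0 to M−1 in ring order, and `None` stands for a register that was loaded with no data.

```python
class Wheel:
    """Toy model of the circular shift register with one C-bit per register."""

    def __init__(self, m):
        self.slots = [None] * m      # register contents, in ring order
        self.c_bits = [False] * m    # cleared at initialization

    def load(self, node, data=None):
        """A node loads its register (or nothing) and then sets its C-bit."""
        self.slots[node] = data
        self.c_bits[node] = True

    def ready(self):
        """The wheel may turn only once every C-bit is set."""
        return all(self.c_bits)

    def tick(self):
        """Advance one clock cycle; returns False if the wheel is not ready."""
        if not self.ready():
            return False
        self.slots = [self.slots[-1]] + self.slots[:-1]
        return True


def exchange(wheel):
    """Turn the wheel M-1 times and record what each node could read."""
    m = len(wheel.slots)
    seen = {node: [] for node in range(m)}
    for _ in range(m - 1):
        if not wheel.tick():
            break
        for node in range(m):
            seen[node].append(wheel.slots[node])
    return seen
```

In this model, a wheel with M = 4 nodes refuses to turn until the fourth node has set its C-bit, even if that node loads no data. After three ticks, each node has had the other three nodes' words pass through its register.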

In M clock cycles, the data is back where it originated, and the “wheel” may be thought of as having made a complete turn. After such a complete turn, each of the C-bits is reset, and the corresponding shift registers are cleared. Then the shift registers 31-35 and 41-45 may be re-loaded with data; C-bits are set; and the clock again causes the “wheel” to turn.

As can be seen from the above explanation, the circular register of shift registers 31-35 and 41-45 facilitates data transfer between the FPGA fabric 14 and the PE fabric 16 while rendering transparent data transfer bit width, asynchronous operation, busing and layout of SoC nodes.

FIG. 3 is a block diagram of an application of a data transfer arrangement in accordance with an embodiment of the present invention. FIG. 3 is by way of example and does not form any limitation of the scope or applicability of the present data transfer arrangement.

Processing arrangement 100 shows an RF (radio frequency) transmission scheme. Antenna 110 receives incoming or transmits outgoing radio frequency signals and/or data. Antenna 110 may include a directional or omni-directional antenna, including, for example, a dipole antenna, a monopole antenna, a patch antenna, a loop antenna, a microstrip antenna or other type of antenna suitable for reception and/or transmission of data signals. Signals received by antenna 110 are transferred to radio frequency subsystem 120. These received signals are then converted from analog to digital by converter 115. Outgoing signals transmitted by the antenna 110 are converted from digital to analog by converter 115.

These converted signals are then passed to SoC arrangement 10 for processing. System on a Chip 10 converts and processes the data as fast as may be required by any radio frequency application. System on a Chip 10 in an embodiment may utilize a register arrangement, FIFO (first-in-first-out) device, a read/write memory device, buffer, or any kind of parallel (N-bit) store-and-forward arrangement. After the data is processed by System on a Chip 10, the data is forwarded to control processor 130 and on to the network.

Similarly, data from the network is sent to control processor 130 of System on a Chip 10 and processed, as required. The data is then sent through RF subsystem 120, including converter 115, and to antenna 110 for wireless transmission.

As a result of the above-described processing, multiple fabrics of the SoC device or arrangement 10 are easily interfaced. Further, SoC device or arrangement 10 may be implemented on a semiconductor chip 12, such as a System on a Chip 10. System on a Chip arrangement 10 in other embodiments may include a “chip-set”. Further, the SoC arrangement 10 may be implemented discretely with individual devices.

The SoC device or arrangement 10, in other embodiments, may include a first fabric 14 of elements 13. The elements 13 of first fabric 14 may be logic elements 13 as mentioned above. The SoC device or arrangement 10 may include another fabric 16 of processing elements 17 that require data exchange with the first fabric 14. The processing elements 17 may include processing elements 17 that require data for processing that are contained in the logic elements 13.

To facilitate the transfer of data between the first fabric 14 and the other fabric 16, a data exchanger or fabric exchange element 15 transfers data between the two fabrics 14 and 16. The data exchanger 15 accounts for the processing differences in the two fabrics 14 and 16 while transferring the data in parallel in a timely fashion. For example, these differences may include, but are not limited to, bus size and clock speed.

As mentioned above, first fabric 14 may include a plurality of logic elements 13. The other fabric 16 may include a plurality of processing elements 17.

The data exchanger 15, in an embodiment, can include an N-bit store-and-forward device such as a memory, buffer, shift register, or a first-in-first-out device. The store-and-forward device, in some embodiments, includes a plurality of shift registers coupled in a circular arrangement 50 to transfer data to each of the plurality of shift registers 31-35, 41-45. A clock 20, in an embodiment, is coupled to each of the plurality of shift registers 31-35, 41-45 to enable the data to be circulated to each of the plurality of shift registers 31-35, 41-45.

Each shift register of the plurality of shift registers 31-35, 41-45 has a corresponding bit 51-55, 61-65. Each corresponding bit 51-55, 61-65 indicates that the loading of data into the corresponding shift register 31-35, 41-45 has been completed.

As mentioned above, the semiconductor device 12 may include the System on a Chip 10. The System on a Chip, in an embodiment, may include the two fabrics 14 and 16 and the data exchanger 15.

FIG. 4 is a flow chart of a method 400 for data transfer in accordance with an embodiment of the present invention. The method 400 is started, and block 402 is entered. Each element 13 and 17 of each fabric 14 and 16, respectively, may load or not load the fabric exchange element 15 with data, block 402. When the loading of each shift register 31-35, 41-45 is completed, in an embodiment, a corresponding C-bit for each shift register 31-35, 41-45 is set, block 404. Even if a shift register, for example shift register 31, has no data, its corresponding C-bit 51 is set.

Block 406 determines when all the C-bits 51-55, 61-65 are set and begins, under control of a clock 20, to cause the plurality of shift registers 31-35, 41-45 to incrementally and circularly rotate the data from one shift register 31 to another shift register 41. Block 408 determines if any fabric element 13 or 17 requires the data now in the corresponding shift register, as the data circularly rotates. If any fabric element 13 or 17 requires the data, block 408 transfers control to block 410 via the YES path. The element 13 or 17 may read the data out of the corresponding shift register, block 410. Block 410 then transfers control to block 412.

If the element 13 or 17 does not require the data presently in its corresponding shift register 31-35, 41-45, then block 408 transfers control to block 412 via the NO path.

The method 400 continues to incrementally and circularly rotate the data through the plurality of circularly-coupled shift registers 31-35, 41-45, block 412. This incremental and circular rotation of the data is controlled by the clock 20. Block 414 determines whether a complete rotation of the data through all of the plurality of shift registers 31-35, 41-45 is completed. If the data has not been rotated through all the shift registers 31-35, 41-45, block 414 transfers control to block 412 via the NO path to continue the incremental and circular rotation of the data through each of the plurality of shift registers 31-35, 41-45.

If block 414 determines that a complete rotation of the data through each of the plurality of shift registers 31-35, 41-45 is completed, then, in an embodiment, block 414 transfers control to block 416 via the YES path.

The method, in block 416, clears all the C-bits 51-55, 61-65 of the corresponding shift registers 31-35, 41-45. All of the shift registers 31-35, 41-45 are cleared in block 418. Block 418 transfers control to block 402 to begin method 400 again for the transfer of data between different fabrics 14 and 16.
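One pass through method 400 can be sketched in Python as follows. This is a hedged illustration rather than the claimed method: the function name `method_400`, the `wants` predicate and the use of `None` for an empty register are assumptions, and the sketch checks for needed data after every rotation step, which is only one possible reading of how blocks 408 through 414 interact.

```python
def method_400(payloads, wants):
    """Simulate one pass of the flow chart for M nodes in ring order.

    payloads[i] is the word node i loads, or None if it has nothing to send.
    wants(node, data) is a hypothetical predicate: True if the node needs data.
    Returns (reads, slots, c_bits) as they stand after blocks 416 and 418.
    """
    m = len(payloads)
    slots = list(payloads)                  # block 402: elements load, or not
    c_bits = [True] * m                     # block 404: set even with no data
    reads = {node: [] for node in range(m)}

    if all(c_bits):                         # block 406: all C-bits are set
        for _ in range(m):                  # block 414: until a complete turn
            slots = [slots[-1]] + slots[:-1]    # blocks 406/412: one step
            for node, data in enumerate(slots):
                if data is not None and wants(node, data):  # block 408
                    reads[node].append(data)                # block 410

    c_bits = [False] * m                    # block 416: clear all C-bits
    slots = [None] * m                      # block 418: clear all registers
    return reads, slots, c_bits
```

On the M-th step of this model each word is back in its originating register, so a node also sees its own word once. A `wants` predicate that rejects a node's own data would filter that out.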

The method 400 allows for the efficient transfer of data among different fabrics of elements.

The description and the drawings illustrate specific embodiments of the invention sufficiently to enable those skilled in the art to practice them. Examples merely typify possible variations. Portions and features of some embodiments may be included in or substituted for those of others.

Although some embodiments of the invention have been illustrated and described in detail, it will be readily apparent to those skilled in the art that various modifications may be made therein without departing from the spirit of these embodiments or from the scope of the appended claims.

1. A data transfer arrangement comprising: a plurality of logic elements; a plurality of processing elements; and a transfer element to transfer data in parallel between each logic element of the plurality of logic elements and each processing element of the plurality of processing elements.
2. The data transfer arrangement as claimed in claim 1, the plurality of logic elements including a plurality of macro cells.
3. The data transfer arrangement as claimed in claim 2, the transfer element including a circular shift register.
4. The data transfer arrangement as claimed in claim 3, the circular shift register further including a plurality of shift registers coupled in parallel in a circular arrangement.
5. The data transfer arrangement as claimed in claim 4, each shift register of the plurality of shift registers including a shift register of N-bits.
6. The data transfer arrangement as claimed in claim 5, each shift register of the plurality of shift registers having a corresponding shift register and having a corresponding bit associated with the corresponding shift register, the corresponding bit indicating that the loading of the data into the corresponding shift register has been completed.
7. The data transfer arrangement as claimed in claim 5, the plurality of shift registers including: a first portion of the plurality of shift registers, the first portion being coupled to the plurality of macro cells; and a second portion of the plurality of shift registers, the second portion being coupled to the plurality of processing elements.
8. The data transfer arrangement as claimed in claim 7, further including a first shift register of the second portion of shift registers coupled to a last shift register of the first portion of shift registers.
9. The data transfer arrangement as claimed in claim 7, further including a first shift register of the first portion of shift registers coupled to a last shift register of the second portion of shift registers.
10. The data transfer arrangement as claimed in claim 7, the first and the second portions of the plurality of shift registers including a plurality of shift registers of N-bits.
11. The data transfer arrangement as claimed in claim 10, further including a clock to produce clock cycles, the clock coupled to the first portion and to the second portion of the plurality of shift registers, each shift register to transmit N-bits of data to a next shift register in the circular shift register during a clock cycle.
12. The data transfer arrangement as claimed in claim 11, the clock operating to clear the corresponding bit for each of the corresponding shift registers of the plurality of shift registers, after M clock cycles, where M is a number of processing elements plus macro cells being coupled to the transfer element.
13. The data transfer arrangement as claimed in claim 1, wherein there is further included a System on a Chip, the System on a Chip including: the plurality of logic elements; the plurality of processing elements; and the transfer element.
14. A semiconductor device comprising: a first fabric of elements; a second fabric of elements; and a data exchanger to transfer data in parallel between the first fabric of elements and the second fabric of elements, the first fabric and the second fabric being data incompatible.
15. The semiconductor device as claimed in claim 14, the first fabric of elements including a plurality of logic elements.
16. The semiconductor device as claimed in claim 14, the second fabric of elements including a plurality of processing elements.
17. The semiconductor device as claimed in claim 14, the data exchanger including an N-bit parallel store-and-forward device.
18. The semiconductor device as claimed in claim 17, the N-bit store-and-forward device including a plurality of shift registers coupled in a circular arrangement.
19. The semiconductor device as claimed in claim 18, each shift register of the plurality of shift registers having a corresponding shift register and having a corresponding bit associated with the corresponding shift register, the corresponding bit indicating that a loading of the data into the corresponding shift register has been completed.
20. The semiconductor device as claimed in claim 14, wherein the semiconductor device further includes a System on a Chip, the System on a Chip including: the first fabric of elements; the second fabric of elements; and the data exchanger.
21. A method comprising: loading data by a first fabric of elements and a second fabric of elements into a store-and-forward device; circularly transferring the data through the store-and-forward device to allow each element of the first and second fabrics to access the data of other elements of the first and second fabrics; and reading the data from the store-and-forward device by an element from the other elements of the first and second fabrics.
22. The method of claim 21, further including when the loading of each of a plurality of shift registers of the store-and-forward device is completed, setting a C-bit corresponding to each of the plurality of shift registers.
23. The method of claim 22, further including when the C-bit of each of the corresponding plurality of shift registers is set, performing circularly transferring the data incrementally.
24. The method of claim 23, further including: determining by the elements if the data is needed by the element; and if the data is needed, performing the reading the data.
25. The method of claim 24, further including determining if the circular transferring of the data is completed.
26. The method of claim 25, further including if a circular transferring of the data is complete: clearing each C-bit; and clearing the data from each of the plurality of shift registers.
27. A system comprising: a System on a Chip including a plurality of logic elements; a plurality of processing elements; a transfer element to transfer data in parallel between each logic element of the plurality of logic elements and each processing element of the plurality of processing elements, the transfer element to produce converted data; a radio-frequency subsystem coupled to the System on a Chip; and an omni-directional antenna coupled to the radio-frequency subsystem.
28. The system as claimed in claim 27, wherein the transfer element includes a circular shift register.
29. The system as claimed in claim 28, wherein the circular shift register includes a plurality of shift registers, each of the plurality of shift registers having a corresponding bit to indicate that a loading of a data into a corresponding shift register has been completed.
30. The system as claimed in claim 29, wherein the plurality of shift registers includes: a first portion of the plurality of shift registers coupled to the plurality of logic elements; a second portion of the plurality of shift registers coupled to the plurality of processing elements; and a clock to produce clock cycles, the clock coupled to the first portion and to the second portion of the plurality of shift registers, each shift register to transmit N-bits of data to a next shift register in the circular shift register during a clock cycle.